Image processing apparatus and image processing method which learn dictionary

ABSTRACT

An image processing apparatus includes a plurality of dictionaries configured to store a feature of an object and information on an imaging direction in a scene for each kind of imaged scene, a detecting unit configured to detect an object with reference to at least one of the plurality of dictionaries in the scene in which the object has been imaged and which is to be learned, an estimating unit configured to estimate the imaging direction of the detected object, a selecting unit configured to select one dictionary from the plurality of dictionaries based on the imaging direction estimated by the estimating unit and the information on the imaging direction in each of the plurality of dictionaries, and a learning unit configured to learn the dictionary selected by the selecting unit, based on a detection result produced by the detecting unit.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image processing apparatus and an image processing method suitably used, in particular, to learn dictionaries of a detector for human figures or the like.

2. Description of the Related Art

Conventionally, a method has been proposed for detecting human figures in an image taken by a camera (see, for example, Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”, CVPR 2005). According to the technique described in that document, a dictionary of a detector is learned in advance through machine learning of human images and background images. The dictionary is then used to identify whether or not a local image of the image received from the camera shows a human figure, thereby detecting the human figure. However, it is known that detection performance degrades if the photography scene and the personal appearance of a human figure at the time of detection differ from those at the time of preliminary learning. Specifically, differences in the photography scene include differences in lighting conditions, differences in shooting direction due to different camera installation locations and angles, the presence or absence of shade, differences in the background, and the like. Differences in personal appearance, on the other hand, include differences in the orientation and clothing of the human figure.

Factors which degrade detection performance include the fact that learning samples collected at the time of preliminary learning cannot cover the diversity of photography scenes and personal appearances of detection objects. To solve this problem, a technique has been proposed for improving detection performance by additionally learning a preliminarily learned dictionary using additional-learning samples collected in photography scenes similar to the one used at the time of detection. Japanese Patent Application Laid-Open No. 2010-529529 proposes a method of creating a dictionary for a Real AdaBoost classifier through preliminary learning and then adapting the dictionary to additional-learning samples through additional learning.

However, with the method described in Japanese Patent Application Laid-Open No. 2010-529529, when there are large differences between preliminary learning and additional learning in the installation angle of the camera, in attributes of the human figures in the image such as color, sex, and age, in the background, and the like, the feature quantities needed for identification differ greatly, which limits the achievable improvement in identification accuracy. Consider, for example, a case in which edge directions and intensities are used as the feature quantity for identification. If the installation angle of the camera with respect to a human figure differs between preliminary learning and additional learning, the positions, angles, and intensities of the edges appearing in the image of the human figure vary. In such a case, the feature quantity of the detection object learned in preliminary learning is difficult to reuse in additional learning, which limits performance improvement. Likewise, when the background texture differs greatly between preliminary learning and additional learning, the feature quantity needed for identification also differs greatly, again limiting performance improvement.

SUMMARY OF THE INVENTION

An object of the present invention is to enable precise additional learning of a dictionary of a detector used to detect an object.

According to an aspect of the present invention, an image processing apparatus comprises: a plurality of dictionaries configured to store information of a feature and an imaging direction of an object in scenes of imaging, per each kind of the scenes; a detecting unit configured to detect the object from the scene in which the object is imaged and is subjected to a learning, by reference to at least one of the plurality of dictionaries; an estimating unit configured to estimate the imaging direction of the object detected; a selecting unit configured to select a dictionary from the plurality of dictionaries, based on the imaging direction estimated by the estimating unit and the information of the imaging direction stored in each of the plurality of dictionaries; and a learning unit configured to perform a learning of the selected dictionary based on a result of the detection by the detecting unit.

Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an exemplary functional configuration of an image recognition apparatus according to an embodiment.

FIG. 2 is a block diagram showing an exemplary hardware configuration of the image recognition apparatus according to the embodiment.

FIG. 3 is a flowchart showing an example of learning processing procedures of the image recognition apparatus according to the embodiment.

FIG. 4 is a flowchart showing an example of detailed processing procedures for calculating adaptability between dictionaries for a video image of a newly introduced scene and a video image of an existing scene.

FIGS. 5A and 5B are diagrams describing camera directions with respect to samples of a detection object.

FIG. 6 is a diagram showing an example of a camera direction distribution for a video image of a newly introduced scene or a video image of an existing scene.

DESCRIPTION OF THE EMBODIMENTS

Preferred embodiments of the present invention will now be described in detail in accordance with the accompanying drawings.

FIG. 2 is a block diagram showing an exemplary hardware configuration of an image recognition apparatus 100 according to an embodiment of the present invention.

In FIG. 2, there are plural imaging devices 201, each of which is an imaging unit made up of a CCD, CMOS, or similar sensor and configured to convert an object image from light into an electrical signal. A signal processing circuit 202 processes a time-series signal of the object images obtained from the imaging devices 201 and converts it into a digital signal.

By executing a control program stored in a ROM 204, a CPU 203 controls the entire image recognition apparatus 100. The ROM 204 stores the control program executed by the CPU 203 as well as various parameter data. When executed by the CPU 203, the control program causes the apparatus to function as various units configured to perform the respective processes shown in the flowcharts described later. A RAM 205 stores images and various information and functions as a work area for the CPU 203 and a temporary save area for data. A display 206 is a display device used to display images and the like.

Note that although in the present embodiment the processes corresponding to the steps of the flowcharts described later are implemented by software using the CPU 203, some or all of the processes may be implemented by hardware such as electronic circuits. The image recognition apparatus according to the present embodiment may also be implemented using a general-purpose PC by omitting the imaging devices 201 and the signal processing circuit 202, or implemented as a dedicated apparatus. Software (programs) acquired through a network or various storage media may also be executed by a processing unit (CPU or processor) of a personal computer or the like.

FIG. 1 is a block diagram showing an exemplary functional configuration of the image recognition apparatus 100 according to the present embodiment. As shown in FIG. 1, the image recognition apparatus 100 according to the present embodiment includes a first image input unit 101, a second image input unit 102, a sample extracting unit 103, a label acquiring unit 104, a dictionary storage unit 105, a dictionary adaptability estimating unit 106, a dictionary selecting unit 112, and an additional learning unit 113.

The first image input unit 101 acquires a video image taken by a camera. The second image input unit 102 acquires video images taken by plural cameras (hereinafter referred to as a group of other cameras) different from the camera which has taken the video image input to the first image input unit 101. Although in the present embodiment it is assumed that the plural video images acquired by the second image input unit 102 are taken in different scenes, video images taken in the same scene may be included. The different scenes referred to in the present embodiment are assumed to be scenes differing in the installation location or angle of the camera, but may be scenes differing in other photographic conditions such as lighting conditions or object distance. Hereinafter, a video image input to the first image input unit 101 will be referred to as a newly introduced scene video image, while a video image input to the second image input unit 102 will be referred to as an existing scene video image.

The sample extracting unit 103 extracts samples by cutting local images from a newly introduced scene video image acquired by the first image input unit 101 and an existing scene video image acquired by the second image input unit 102. The label acquiring unit 104 adds a label to each sample extracted by the sample extracting unit 103, the label indicating whether the sample is a detection object or an object other than a detection object (the background in the present embodiment). A method for adding a label will be described later.

The dictionary storage unit 105 stores object detector dictionaries learned beforehand in plural existing scenes. In other words, the dictionary storage unit 105 stores plural object detector dictionaries learned in the scenes taken by the group of other cameras. In the present embodiment, it is assumed that M dictionaries learned in M scenes are stored. It is also assumed in the present embodiment that each dictionary is a classifier dictionary learned with Real AdaBoost and is made up of plural lookup tables that constitute weak classifiers. Note that although the dictionaries are assumed to belong to classifiers learned with Real AdaBoost, they may be based on another learning method or classifier. Furthermore, a dictionary may be created by updating an already learned dictionary through the additional learning described in Japanese Patent Application Laid-Open No. 2010-529529.

The dictionary adaptability estimating unit 106 estimates the adaptability of the dictionaries to a newly introduced scene using the samples acquired by the sample extracting unit 103, the labels acquired by the label acquiring unit 104, and the plural dictionaries stored in the dictionary storage unit 105. In the present embodiment, the adaptability is a reference index indicating the similarity between a newly introduced scene and an existing scene, but another reference index may be used. The dictionary adaptability estimating unit 106 includes an object attribute estimating unit 107, an object attribute adaptability estimating unit 108, an object sample adaptability estimating unit 109, a background sample adaptability estimating unit 110, and an adaptability integrating unit 111.

The object attribute estimating unit 107 acquires an attribute of each detection object sample. In the present embodiment, the attribute is the camera direction with respect to the detection object, but it may be the color, age, sex, or another attribute of the detection object. The camera direction will be described later. The object attribute adaptability estimating unit 108 calculates the adaptability of the dictionaries to a newly introduced scene using the sample attributes. The object sample adaptability estimating unit 109 calculates the adaptability of the dictionaries to the newly introduced scene using the sample attributes, the detection object samples in the newly introduced scene, and the dictionaries stored in the dictionary storage unit 105.

The background sample adaptability estimating unit 110 calculates the adaptability of the dictionaries to the newly introduced scene using background samples of the newly introduced scene and the dictionaries stored in the dictionary storage unit 105. The adaptability integrating unit 111 integrates the adaptability values calculated by the object attribute adaptability estimating unit 108, the object sample adaptability estimating unit 109, and the background sample adaptability estimating unit 110.

The dictionary selecting unit 112 selects a suitable dictionary from the dictionaries stored in the dictionary storage unit 105 based on the adaptability calculated by the dictionary adaptability estimating unit 106. The additional learning unit 113 updates the selected dictionary using samples of the newly introduced scene.

Operation of each component shown in FIG. 1 will be described below with reference to the flowchart of FIG. 3.

FIG. 3 is a flowchart showing an example of learning processing procedures of the image recognition apparatus 100 according to the present embodiment.

First, in step S301, the first image input unit 101 acquires a video image of a newly introduced scene.

Next, in step S302, the sample extracting unit 103 extracts samples of the detection object and samples of the background, which is the portion other than the object, from the newly introduced scene video image, and the label acquiring unit 104 adds labels to the extracted samples. A predetermined number of samples of each kind is extracted.

Here, each sample is extracted by cutting a local image of an arbitrary size from an arbitrary location of the video image. To add a detection object label to each sample extracted from the newly introduced scene video image, the tracking-by-detection method described by M. P. Breitenstein et al. in “Robust tracking-by-detection using a detector confidence particle filter”, ICCV 2009, is used. Specifically, a detection process is first performed by a detector. To reduce false detections and improve detection reliability, only detection results whose likelihood is higher than a predetermined threshold are adopted, where the likelihood represents the degree to which an output of the detector is likely to be a detection object. Object tracking is then performed in subsequent video image frames, using each detection result whose likelihood exceeds the threshold as an initial detection result. This allows a label to be added to a sample of a detection object which is difficult to detect using the detector alone.
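The labeling scheme above can be sketched as follows. This is a simplified stand-in, not the particle-filter tracker of Breitenstein et al.: it assumes the detector returns candidate windows with likelihoods for every frame, starts a track only from a detection whose likelihood exceeds the threshold, and then follows the track by greedily picking the candidate with the largest intersection over union (IoU) in each later frame, even if that candidate's likelihood is low. The box format, `iou_min`, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_positives(
    frames: Sequence[Sequence[Tuple[Box, float]]],
    threshold: float,
    iou_min: float = 0.5,
) -> List[Tuple[int, Box]]:
    """Return (frame index, box) pairs labeled as the detection object.

    frames[t] holds the detector's candidate windows for frame t as
    (box, likelihood) pairs, including low-likelihood ones.
    """
    positives: List[Tuple[int, Box]] = []
    tracks: List[Box] = []  # last known box of each active track
    for t, candidates in enumerate(frames):
        used = set()
        next_tracks: List[Box] = []
        # Extend each track with its best-overlapping candidate, whatever its likelihood.
        for last in tracks:
            best_j, best_overlap = -1, iou_min
            for j, (box, _) in enumerate(candidates):
                overlap = iou(last, box)
                if j not in used and overlap >= best_overlap:
                    best_j, best_overlap = j, overlap
            if best_j >= 0:
                used.add(best_j)
                next_tracks.append(candidates[best_j][0])
                positives.append((t, candidates[best_j][0]))
        # Start new tracks only from detections above the likelihood threshold.
        for j, (box, likelihood) in enumerate(candidates):
            if j not in used and likelihood > threshold:
                next_tracks.append(box)
                positives.append((t, box))
        tracks = next_tracks  # tracks with no match in this frame are dropped
    return positives
```

In this sketch, a low-likelihood window is labeled only when it continues a track that began with a confident detection, which mirrors the purpose stated above.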

Although the present embodiment uses tracking-by-detection in this way to add a detection object label, a label may alternatively be added manually by a user via a control panel (not shown). An initial detection result may also be entered manually by the user via the control panel (not shown), and a label may then be added by object tracking.

Next, to add a background label, the background label is added to a sample extracted from a video image frame which does not contain any detection object. In the present embodiment, the background label is added to a sample of any size extracted at any position coordinates, but it may alternatively be added only to hard negative samples, which are background samples that are hard to identify. That is, the background label may be added only to local images whose likelihood of being a detection object is higher than a predetermined value. Using only hard negative samples in this way efficiently selects, from a large set of background samples, those samples prone to cause false identification.
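Hard negative selection reduces to a filter on the detector output. In this minimal sketch, `likelihood` is a hypothetical stand-in for the detector's output and `min_likelihood` for the predetermined value; every input sample is assumed to come from a frame already known to contain no detection object.

```python
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def select_hard_negatives(
    background_samples: Sequence[Sample],
    likelihood: Callable[[Sample], float],
    min_likelihood: float,
) -> List[Sample]:
    """Keep only background samples that the detector finds object-like.

    Every input is already known to be background, so a high likelihood
    marks a probable false detection that is worth learning from.
    """
    return [s for s in background_samples if likelihood(s) > min_likelihood]
```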

As described above, to add the background label, a sample cut from a video image frame containing no detection object is processed. Alternatively, a sample extracted from the area of the video image excluding the detection object area may be processed, using moving object detection based on background subtraction, the tracking-by-detection described above, or the like.

In the loop starting from step S303, the processes of steps S304 to S306 are repeated for each of the M existing scene video images.

First, in step S304, the second image input unit 102 acquires an existing scene video image from one camera in the group of other cameras. Then, in step S305, the sample extracting unit 103 extracts a sample of the background from the existing scene video image, and the label acquiring unit 104 adds a label to the extracted background sample. This process is performed in a manner similar to step S302.

Next, in step S306, the dictionary adaptability estimating unit 106 calculates the adaptability between dictionaries for the newly introduced scene video image and the existing scene video image. A detailed process of this step will be described later. In this way, the processes of steps S304 to S306 are repeated for each existing scene video image.

Next, in step S307, based on the adaptability calculated for each existing scene video image in the loop of step S303, the dictionary selecting unit 112 selects a dictionary suitable for updating from among the dictionaries stored in the dictionary storage unit 105. Although the dictionary with the highest adaptability is selected in the present embodiment, another selection method may be used.
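The loop of steps S303 to S306 followed by the selection of step S307 amounts to scoring every stored dictionary and taking the argmax. In the sketch below, `adaptability` is a hypothetical callable standing in for the whole of step S306 (the FIG. 4 procedure); ties go to the first dictionary.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Dictionary = TypeVar("Dictionary")


def select_dictionary(
    dictionaries: Sequence[Dictionary],
    adaptability: Callable[[Dictionary], float],
) -> Tuple[int, List[float]]:
    """Score every stored dictionary, then return the index of the best one."""
    scores = [adaptability(d) for d in dictionaries]  # loop of S303, one S306 per scene
    best = max(range(len(scores)), key=lambda m: scores[m])  # S307: highest adaptability
    return best, scores
```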

Then, in step S308, using the samples extracted in step S302, the additional learning unit 113 additionally learns and updates the dictionary selected in step S307. As the additional learning method, the present embodiment uses the technique described in Japanese Patent Application Laid-Open No. 2010-529529. Specifically, the values of the lookup tables which make up the Real AdaBoost weak classifiers are updated using positive samples and negative samples. Note that the additional learning method is not limited to this one, and another method may be used.
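The sketch below illustrates what a lookup-table weak classifier and a table update can look like. It does not reproduce the update rule of Japanese Patent Application Laid-Open No. 2010-529529; it assumes the standard Real AdaBoost domain-partitioning form, in which each bin outputs half the log ratio of positive to negative sample weights, and models additional learning as adding the weights of the new scene's samples to the stored per-bin sums before recomputing the table. The class name, the smoothing constant `eps`, and the uniform binning of a single scalar feature are all assumptions.

```python
import math
from typing import Iterable, List, Tuple


class LutWeakClassifier:
    """One Real AdaBoost weak classifier stored as a lookup table.

    A scalar feature value in [lo, hi) is quantized into n_bins bins, and
    each bin outputs 0.5 * ln((W+ + eps) / (W- + eps)), where W+ and W- are
    the summed weights of positive and negative samples that fell into it.
    """

    def __init__(self, n_bins: int, lo: float, hi: float, eps: float = 1e-3):
        self.n_bins, self.lo, self.hi, self.eps = n_bins, lo, hi, eps
        self.w_pos: List[float] = [0.0] * n_bins
        self.w_neg: List[float] = [0.0] * n_bins
        self.table: List[float] = [0.0] * n_bins

    def bin_of(self, feature: float) -> int:
        j = int((feature - self.lo) / (self.hi - self.lo) * self.n_bins)
        return min(max(j, 0), self.n_bins - 1)  # clamp out-of-range values

    def update(self, samples: Iterable[Tuple[float, int, float]]) -> None:
        """Add (feature, label in {+1, -1}, weight) samples and rebuild the table.

        Called once for preliminary learning and again with samples of the
        newly introduced scene for additional learning.
        """
        for feature, label, weight in samples:
            j = self.bin_of(feature)
            if label > 0:
                self.w_pos[j] += weight
            else:
                self.w_neg[j] += weight
        self.table = [
            0.5 * math.log((p + self.eps) / (n + self.eps))
            for p, n in zip(self.w_pos, self.w_neg)
        ]

    def __call__(self, feature: float) -> float:
        return self.table[self.bin_of(feature)]
```

A dictionary in this model would hold many such tables, one per selected feature, with the strong classifier summing their outputs; because the per-bin sums from preliminary learning are kept, the selected dictionary serves as the starting point for additional learning.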

Next, details of the process of step S306 performed by the dictionary adaptability estimating unit 106 will be described with reference to FIG. 4.

FIG. 4 is a flowchart showing an example of detailed processing procedures in step S306 of FIG. 3.

First, in step S401, the object attribute estimating unit 107 acquires the attribute of the detection object samples in the newly introduced scene video image and the existing scene video image, i.e., the camera direction with respect to the detection object samples. In the present embodiment, as shown in FIG. 5A, the camera direction is classified into one of the 0-degree, 30-degree, 45-degree, and 60-degree directions in terms of elevation angle.

To acquire the camera directions of the samples, detectors configured to detect objects only in specific directions are prepared in advance. For example, as shown in FIG. 5B, detectors are prepared which have been learned using detection object sample groups obtained by photographing detection objects in camera directions of 0 degrees, 30 degrees, 45 degrees, and 60 degrees in terms of elevation angle, respectively. The detectors for the respective camera directions are then applied to each sample, and the direction corresponding to the detector which outputs the highest likelihood is established as the camera direction of the sample.
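Choosing the camera direction by the highest-scoring direction-specific detector is an argmax over the detectors. In this sketch, `direction_detectors` is a hypothetical mapping from elevation angle to a scoring function standing in for the FIG. 5B detectors.

```python
from typing import Callable, Mapping, TypeVar

Sample = TypeVar("Sample")


def estimate_camera_direction(
    sample: Sample,
    direction_detectors: Mapping[int, Callable[[Sample], float]],
) -> int:
    """Return the elevation angle whose dedicated detector scores the sample highest."""
    return max(direction_detectors, key=lambda angle: direction_detectors[angle](sample))
```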

Although in the present embodiment detectors configured to detect objects only in specific camera directions are prepared in advance, direction classifiers for detection objects may be prepared by another method. The direction of the detection object may also be entered manually by the user via the control panel (not shown), acquired by external sensors configured to acquire the position of the detection object, acquired from prior knowledge about the installation location or angle of the camera, or the like. Also, if the detectors of the existing scene video images are designed to output not only the likelihood of the detection object but also its direction, those detectors may be used instead. Furthermore, although the elevation angle of the camera is used in the present embodiment, the yaw angle, pitch angle, roll angle, or the like of the detection object, or a combination thereof, may be used.

Next, in step S402, based on the camera directions of the detection object samples in the newly introduced scene video image and the existing scene video image acquired in step S401, the object attribute adaptability estimating unit 108 creates a direction distribution for each scene. The object attribute adaptability estimating unit 108 then calculates the adaptability from the dissimilarity between the direction distributions.

In step S402, a direction distribution of the detection object samples in each scene is first created as shown in FIG. 6. This is done by counting the number of samples in each camera direction, based on the camera directions acquired in step S401. Next, the direction distributions of the newly introduced scene video image and the existing scene video image are compared. Although KL divergence is used in the present embodiment to compare the direction distributions, another reference index such as histogram intersection or Earth Mover's Distance may be used. The value opposite in sign to the result of the comparison between the distributions is established as the adaptability S_(dist) between the direction distributions of the scenes. Because the dissimilarity between direction distributions is taken into account in determining the adaptability between scenes, an existing scene video image with a similar occurrence tendency of detection object directions is readily selected. Consequently, an existing scene video image with a similar camera installation location and angle is readily selected.

In the present embodiment, the object attribute adaptability estimating unit 108 estimates the adaptability based on the distribution of camera directions with respect to the detection object, as in step S402. However, the adaptability may be estimated from the camera directions with respect to the detection object by another method that does not use a distribution. For example, the average direction angle may be calculated separately for the newly introduced scene video image and the existing scene video image, and the adaptability may be estimated by comparing the calculated values.
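Step S402 and the mean-angle alternative can be sketched as below. The text fixes neither the argument order of the KL divergence nor how empty bins are handled, so this sketch assumes D_KL(P_new || P_existing) and adds a small constant `eps` to every bin; the mean-angle variant assumes the comparison is a negated absolute difference. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

DIRECTIONS = (0, 30, 45, 60)  # elevation-angle classes of FIG. 5A


def direction_distribution(directions: Sequence[int], eps: float = 1e-6) -> List[float]:
    """Normalized histogram of sample camera directions (cf. FIG. 6).

    eps is added to every bin so that an empty bin never makes the KL
    divergence infinite; if every direction is one of DIRECTIONS, the
    result still sums to 1.
    """
    counts = Counter(directions)
    total = len(directions) + eps * len(DIRECTIONS)
    return [(counts[d] + eps) / total for d in DIRECTIONS]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def s_dist(new_scene_dirs: Sequence[int], existing_scene_dirs: Sequence[int]) -> float:
    """Adaptability between direction distributions: the negated KL divergence."""
    p = direction_distribution(new_scene_dirs)
    q = direction_distribution(existing_scene_dirs)
    return -kl_divergence(p, q)


def s_dist_by_mean(new_scene_dirs: Sequence[int], existing_scene_dirs: Sequence[int]) -> float:
    """Distribution-free alternative: negated gap between mean direction angles."""
    mean_new = sum(new_scene_dirs) / len(new_scene_dirs)
    mean_existing = sum(existing_scene_dirs) / len(existing_scene_dirs)
    return -abs(mean_new - mean_existing)
```

Identical direction distributions give S_(dist) = 0, the maximum, and the value falls as the existing scene's directions drift away from those of the newly introduced scene.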

Next, in step S403, the object sample adaptability estimating unit 109 calculates the adaptability of the detection object samples. First, the detector of the existing scene video image outputs a likelihood which represents the degree to which a given sample is likely to be a sample of the detection object. A high likelihood for a detection object sample means that the detector is well suited to that sample. Thus, the average likelihood is used in calculating the adaptability of the detection object samples. Here, if X_(pos) is the detection object sample group in an existing scene video image, |X_(pos)| is the number of samples in X_(pos), x is a sample in X_(pos), and H(x) is the likelihood output by the detector, then the adaptability S_(pos) of the detection object samples is given by Eq. (1) below.

$\begin{matrix}{S_{pos} = {\frac{1}{X_{pos}}{\sum\limits_{x \in X_{pos}}{H(x)}}}} & (1)\end{matrix}$

Eq. (1) does not take into account the camera direction with respect to the detection object samples when calculating the adaptability S_(pos), but the direction may be taken into consideration. For example, using Eqs. (2) and (3) below instead of Eq. (1), the adaptability S_(pos) of the detection object samples may be found by calculating the average likelihood T(X_(pos)^(d)) for each camera direction and then averaging these values over the directions.

$\begin{matrix}{S_{pos} = {\frac{1}{D}{\sum\limits_{d \in D}{T\left( X_{pos}^{d} \right)}}}} & (2) \\{{T\left( X_{pos}^{d} \right)} = {\frac{1}{X_{pos}^{d}}{\sum\limits_{x \in X_{pos}^{d}}{H(x)}}}} & (3)\end{matrix}$

where D is the set of directions, |D| is the number of directions, d is a direction, and X_(pos)^(d) is the detection object sample group in an existing scene video image having the direction d.
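Eqs. (1) to (3) translate directly into code. In this sketch, `H` is a hypothetical callable standing in for the detector likelihood, and the direction set D is assumed to be the set of directions that actually occur among the samples, so that no X_(pos)^(d) is empty.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def s_pos(samples: Sequence[Sample], H: Callable[[Sample], float]) -> float:
    """Eq. (1): mean detector likelihood over the detection object samples."""
    return sum(H(x) for x in samples) / len(samples)


def s_pos_by_direction(
    samples: Sequence[Sample],
    directions: Sequence[Hashable],
    H: Callable[[Sample], float],
) -> float:
    """Eqs. (2) and (3): mean likelihood per direction, then mean over directions."""
    groups: Dict[Hashable, List[Sample]] = defaultdict(list)
    for x, d in zip(samples, directions):
        groups[d].append(x)
    # D is taken to be the directions actually observed, so no group is empty.
    return sum(s_pos(g, H) for g in groups.values()) / len(groups)
```

With the per-direction form, a direction that occurs rarely contributes as much to S_(pos) as a common one; for instance, three samples at 0 degrees with likelihood 1.0 and one at 30 degrees with likelihood 0.0 give 0.75 under Eq. (1) but 0.5 under Eqs. (2) and (3).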

Next, in step S404, the background sample adaptability estimating unit 110 calculates the adaptability of the background samples. In contrast to step S403, the lower the likelihood that a given sample is a sample of the detection object, the higher the degree to which the sample belongs to the background. Consequently, a low detection object likelihood for a background sample means that the detector is well suited to that sample. Thus, in calculating the adaptability of the background samples, the value opposite in sign to the average likelihood is used. Here, if X_(neg) is the background sample group in an existing scene, |X_(neg)| is the number of samples in X_(neg), x is a sample in X_(neg), and H(x) is the likelihood output by the detector, then the adaptability S_(neg) of the background samples is given by Eq. (4) below.

$S_{neg} = -\frac{1}{|X_{neg}|} \sum_{x \in X_{neg}} H(x) \qquad (4)$
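Eq. (4) differs from Eq. (1) only in the sample group and the sign. As before, `H` is a hypothetical stand-in for the detector likelihood in this short sketch.

```python
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def s_neg(background_samples: Sequence[Sample], H: Callable[[Sample], float]) -> float:
    """Eq. (4): negated mean detector likelihood over the background samples.

    A detector that assigns background samples a low object likelihood
    receives a higher adaptability.
    """
    return -sum(H(x) for x in background_samples) / len(background_samples)
```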

Next, in step S405, the adaptability integrating unit 111 integrates the three types of adaptability calculated in steps S402 to S404 and calculates the final adaptability between the newly introduced scene video image and the existing scene video image. In the present embodiment, a linear sum of the three types of adaptability is established as the final adaptability. If S_(dist) is the adaptability between the scenes, S_(pos) is the adaptability of the detection object samples, and S_(neg) is the adaptability of the background samples, then the final adaptability S is given by Eq. (5) below.

$S = \lambda_{dist} S_{dist} + \lambda_{pos} S_{pos} + \lambda_{neg} S_{neg} \qquad (5)$

where λ_(dist), λ_(pos), and λ_(neg) are weighting factors set in advance. Although the present embodiment obtains the adaptability S by integrating the adaptability S_(dist) between the direction distributions of the scenes, the adaptability S_(pos) of the detection object samples, and the adaptability S_(neg) of the background samples, the adaptability S may be obtained by integrating at least one of these types of adaptability. In that case, only the necessary types among S_(dist), S_(pos), and S_(neg) need to be calculated.
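A minimal sketch of Eq. (5), assuming the adaptability terms and weighting factors are passed as dictionaries keyed by hypothetical names such as `"dist"`, `"pos"`, and `"neg"`; any weight values used with it are illustrative, since the text only says that the weights are set in advance.

```python
from typing import Mapping


def integrate_adaptability(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Eq. (5): weighted linear sum of the available adaptability terms.

    Terms absent from `scores` are simply skipped, which covers the case in
    which only some of S_dist, S_pos, and S_neg are calculated.
    """
    return sum(weights[name] * value for name, value in scores.items())
```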

As described above, according to the present embodiment, additional learning is conducted starting from the detector of the existing scene video image that is most compatible with the newly introduced scene video image. This allows highly compatible feature quantities obtained in preliminary learning to be used in additional learning, thereby improving identification accuracy. In addition, the iterative process of additional learning can be started from a suitable initial value, which facilitates its convergence and reduces the calculation cost of additional learning. For example, when a camera is installed in a new environment, if learning starts from the detector of an existing scene video image that has been learned in another, similar location, improved identification accuracy and faster learning can be expected.

Also, the adaptability of the detection object, the adaptability of the background, and the adaptability between the direction distributions of the detection object are used to select the detector dictionary used for additional learning. Because the adaptability of the detection object allows the use of feature quantities, obtained by preliminary learning, that properly identify the detection object, identification accuracy can be improved. Likewise, because the adaptability of the background allows the use of feature quantities, obtained by preliminary learning, that properly identify the background, identification accuracy can be improved. Furthermore, the adaptability between the direction distributions of the detection object, i.e., the adaptability of a detection object attribute, allows selection of an existing scene video image whose occurrence tendency of the detection object attribute is similar to that of the newly introduced scene video image. Thus, improved reliability of selection, and consequently improved identification accuracy, can be expected.

Other Embodiments

Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

This application claims the benefit of Japanese Patent Application No. 2014-137149, filed Jul. 2, 2014, which is hereby incorporated by reference herein in its entirety.

What is claimed is:
1. An image processing apparatus comprising: a plurality of dictionaries configured to store information of a feature and an imaging direction of an object in scenes of imaging, per each kind of the scenes; a detecting unit configured to detect the object from the scene in which the object is imaged and is subjected to a learning, by reference to at least one of the plurality of dictionaries; an estimating unit configured to estimate the imaging direction of the object detected; a selecting unit configured to select a dictionary from the plurality of dictionaries, based on the imaging direction estimated by the estimating unit and the information of the imaging direction stored in each of the plurality of dictionaries; and a learning unit configured to perform a learning of the selected dictionary based on a result of the detection by the detecting unit.
2. The image processing apparatus according to claim 1, wherein the selecting unit derives a direction adaptability of each of the plurality of dictionaries based on the imaging direction estimated by the estimating unit and the information of the imaging direction stored in each of the plurality of dictionaries, and selects one from the plurality of dictionaries based on the direction adaptability.
3. The image processing apparatus according to claim 2, wherein the selecting unit derives the direction adaptability of each of the plurality of dictionaries, based on distribution information of the imaging directions estimated and distribution information of the imaging direction of which information is stored in each of the plurality of dictionaries.
4. The image processing apparatus according to claim 2, wherein the selecting unit derives an object adaptability of each of the plurality of dictionaries based on a feature of the object detected and the information of the feature stored in each of the plurality of dictionaries, and selects one from the plurality of dictionaries based on the object adaptability and the direction adaptability of each of the plurality of dictionaries.
5. The image processing apparatus according to claim 2, wherein each of the plurality of dictionaries stores a feature of a background, and the selecting unit derives a background adaptability of each of the plurality of dictionaries based on a feature of the background except for the object detected and the feature of the background stored in each of the plurality of dictionaries, and selects one from the plurality of dictionaries based on the background adaptability and the direction adaptability of each of the plurality of dictionaries.
6. An image processing method comprising: detecting an object from a scene in which the object is imaged and is subjected to a learning, by reference to at least one of a plurality of dictionaries configured to store information of a feature and an imaging direction of the object in the scene of imaging, per each kind of the scenes; estimating the imaging direction of the object detected; selecting a dictionary from the plurality of dictionaries, based on the imaging direction estimated and the information of the imaging direction stored in each of the plurality of dictionaries; and performing a learning of the selected dictionary based on a result of the detection.
7. A non-transitory computer-readable recording medium storing a readable program for operating a computer to execute the image processing method according to claim 6.